Extraction tools for collocations and their morphosyntactic specificities

Authors

  • Julia Ritz
  • Ulrich Heid
Abstract

We describe tools that extract collocations not only as word combinations, but also together with data about the morphosyntactic properties of the collocation candidates. Such data are needed for a detailed lexical description of collocations, and to support both their recognition in text and the generation of collocationally acceptable text. We describe the tool architecture, report on a case study based on noun+verb collocations, and give a first rough evaluation of the quality of the data produced.

Similar resources

Identifying Morphosyntactic Preferences in Collocations

In this paper, we describe research that aims to make evidence on the morphosyntactic preferences of collocations available to lexicographers. Our methods for the extraction of appropriate frequency data and its statistical analysis are applied to the number and case preferences of German adjective+noun combinations in a small case study.

Tools for Collocation Extraction: Preferences for Active vs. Passive

We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especially for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision eval...

Towards a corpus-based dictionary of German noun-verb collocations

We describe our attempts to automatically extract raw material for a dictionary of German noun-verb collocations from large corpora of newspaper text. Such a dictionary should be about collocations and it should include a description of their linguistic properties, rather than listing the mere lexical cooccurrence. Since most statistical collocation finding tools do not provide other than lexic...

Identification of Noun-Noun (N-N) Collocations as Multi-Word Expressions in Bengali Corpus

Noun-Noun compounds, as a subset of Compound Nouns as well as Nominal Compounds, play an important role in NLP applications such as Machine Translation and Information Retrieval because of their token frequency, type frequency and occurrence in the world’s languages. Recognition of MWEs requires deep or shallow syntactic preprocessing tools and large corpora. The problem is quite difficult in Beng...

Collocation Extraction: Needs, Feeds And Results Of An Extraction System For German

This paper provides a specification of requirements for collocation extraction systems, taking as an example the extraction of noun + verb collocations from German texts. A hybrid approach to the extraction of habitual collocations and idioms is presented, aiming at a detailed description of collocations and their morphosyntax for natural language generation systems as well as to support learne...

Publication date: 2006